Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
Conventional reinforcement learning (RL) needs an environment to collect
fresh data, which is impractical when online interactions are costly. Offline
RL provides an alternative solution by directly learning from the previously
collected dataset. However, it will yield unsatisfactory performance if the
quality of the offline datasets is poor. In this paper, we consider an
offline-to-online setting in which the agent is first trained on the offline
dataset and then trained online, and we propose a framework called Adaptive
Policy Learning to effectively take advantage of both offline and online data.
Specifically, we explicitly consider the difference between the online and
offline data and apply an adaptive update scheme accordingly, that is, a
pessimistic update strategy for the offline dataset and an optimistic/greedy
update scheme for the online dataset. This simple and effective method
provides a way to combine offline and online RL and achieve the best of both
worlds. We further provide two detailed algorithms that implement the
framework by embedding value-based or policy-based RL algorithms into it.
Finally, we conduct extensive experiments on popular continuous control tasks,
and results show that our algorithm can learn the expert policy with high
sample efficiency even when the quality of the offline dataset is poor, e.g.,
a random dataset. Comment: AAAI202
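The pessimistic/optimistic split described above can be illustrated with a minimal sketch. The function name, the penalty form, and all parameters here are hypothetical illustrations of the general idea, not the paper's actual algorithm:

```python
def bellman_target(reward, next_q_values, done, gamma=0.99,
                   offline=False, pessimism=0.5):
    """Return a one-step TD target for a single transition.

    next_q_values: Q(s', a') estimates, one per action.
    offline: if True, subtract a pessimism penalty so the update does
    not over-trust out-of-distribution actions from the static dataset;
    online transitions get the standard greedy (optimistic) target.
    """
    greedy = max(next_q_values)
    if offline:
        # The spread between best and worst action value stands in for
        # an epistemic-uncertainty measure (illustrative choice).
        penalty = pessimism * (max(next_q_values) - min(next_q_values))
        bootstrap = greedy - penalty
    else:
        bootstrap = greedy
    return reward + gamma * (0.0 if done else bootstrap)
```

With identical data, the offline target is strictly more conservative whenever the Q-estimates disagree, which is the intended "adaptive" behavior.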
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
In long context scenarios, large language models (LLMs) face three main
challenges: higher computational/financial cost, longer latency, and inferior
performance. Some studies reveal that the performance of LLMs depends on both
the density and the position of the key information (question relevant) in the
input prompt. Inspired by these findings, we propose LongLLMLingua for prompt
compression towards improving LLMs' perception of the key information to
simultaneously address the three challenges. We conduct evaluation on a wide
range of long context scenarios including single-/multi-document QA, few-shot
learning, summarization, synthetic tasks, and code completion. The experimental
results show that the LongLLMLingua-compressed prompt achieves higher
performance at much lower cost. The latency of the end-to-end system is also
reduced. For example, on the NaturalQuestions benchmark, LongLLMLingua gains a
performance boost of up to 17.1% over the original prompt with ~4x fewer
tokens as input to
GPT-3.5-Turbo. It can yield cost savings of $28.5 and $27.4 per 1,000
samples on the LongBench and ZeroScrolls benchmarks, respectively.
Additionally, when compressing prompts of ~10k tokens at a compression rate of
2x-10x, LongLLMLingua can speed up the end-to-end latency by 1.4x-3.8x. Our
code is available at https://aka.ms/LLMLingua
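The question-aware compression idea can be sketched in a few lines. This toy version scores chunks by lexical overlap with the question, whereas the actual method uses a small LM's perplexity-based importance scores; the function and its parameters are illustrative assumptions:

```python
def compress_prompt(question, chunks, budget_tokens):
    """Keep the chunks most relevant to the question, within a token budget.

    Relevance here is crude word overlap with the question; tokens are
    approximated by whitespace-split words.
    """
    q_words = set(question.lower().split())

    def score(chunk):
        words = chunk.lower().split()
        return sum(w in q_words for w in words) / max(len(words), 1)

    ranked = sorted(chunks, key=score, reverse=True)
    kept, used = [], 0
    for chunk in ranked:
        n = len(chunk.split())
        if used + n <= budget_tokens:
            kept.append(chunk)
            used += n
    # Restore the original chunk order: the abstract notes that the
    # *position* of key information also affects LLM performance.
    kept.sort(key=chunks.index)
    return " ".join(kept)
```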
Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling
Timely detection of seizures in newborn infants using electroencephalography
(EEG) is a common yet life-saving practice in the Neonatal Intensive Care
Unit (NICU). However, it requires great human effort for real-time monitoring,
which calls for automated solutions to neonatal seizure detection. Moreover,
current automated methods, which focus on adult epilepsy monitoring, often
fail due to (i) dynamic seizure onset locations in human brains; (ii)
different montages used on neonates; and (iii) large distribution shifts among
subjects. In this paper, we propose a deep learning framework, namely
STATENet, to address these unique challenges with dedicated designs at the
temporal, spatial, and model levels. Experiments on a real-world large-scale
neonatal EEG dataset show that our framework achieves significantly
better seizure detection performance. Comment: Accepted at the IEEE
International Conference on Systems, Man, and Cybernetics (SMC) 202
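The temporal and spatial levels mentioned above can be illustrated with a minimal detector. The feature choice (line length, a classic EEG seizure marker) and the channel-count-agnostic pooling are illustrative assumptions, not STATENet's actual design:

```python
def line_length(signal):
    """Temporal feature: sum of absolute sample-to-sample differences.
    Seizure activity typically raises signal variability, so line
    length grows during seizures."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))

def detect_window(eeg_window, threshold):
    """eeg_window: list of per-channel sample lists (any channel count).

    Pooling with max() across channels makes the detector indifferent
    to montage size and channel order, which matters because neonatal
    montages differ from the adult ones most methods assume."""
    features = [line_length(ch) for ch in eeg_window]
    return max(features) > threshold
```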
Insight-HXMT on-orbit thermal control status and thermal deformation impact analysis
Purpose: The Hard X-ray Modulation Telescope is China's first X-ray astronomy
satellite, launched on June 15, 2017, and dubbed Insight-HXMT. Active and passive
thermal control measures are employed to keep devices at suitable temperatures.
In this paper, we analyzed the on-orbit thermal monitoring data of the first 5
years and investigated the effect of thermal deformation on the point spread
function (PSF) of the telescopes.
Methods: We examined the data of the on-orbit temperatures measured using 157
thermistors placed on the collimators, detectors and their support structures
and compared the results with the thermal control requirements. The thermal
deformation was evaluated from the relative orientation of the two star
sensors installed on the main support structure. Its effect was estimated
from the evolution of the PSF obtained with calibration scanning observations
of the Crab Nebula.
Conclusion: The on-orbit temperatures met the thermal control requirements
thus far, and the effect of thermal deformation on the PSF was negligible after
the on-orbit pointing calibration. Comment: 25 pages, 35 figures, submitted
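Tracking thermal deformation via the relative orientation of two star sensors boils down to the angle between two attitude quaternions. A minimal sketch (the function name and (w, x, y, z) convention are assumptions; the formula itself is standard):

```python
import math

def quat_angle(q1, q2):
    """Rotation angle (radians) between two unit quaternions (w, x, y, z).

    Drift in this relative-orientation angle between two star sensors
    mounted on the same structure is a proxy for thermal deformation
    of that structure."""
    # abs() handles the double-cover: q and -q are the same rotation.
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    dot = min(1.0, dot)  # guard against rounding past acos's domain
    return 2.0 * math.acos(dot)
```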
Overview to the Hard X-ray Modulation Telescope (Insight-HXMT) Satellite
As China's first X-ray astronomical satellite, the Hard X-ray Modulation
Telescope (HXMT), dubbed Insight-HXMT after its launch on June 15, 2017, is a
wide-band (1-250 keV) slat-collimator-based X-ray astronomy satellite with
the capability of all-sky monitoring in the 0.2-3 MeV band. It was
designed to perform pointing, scanning and gamma-ray burst (GRB) observations
and, based on the Direct Demodulation Method (DDM), the image of the scanned
sky region can be reconstructed. Here we give an overview of the mission and
its progress, including the payload, core sciences, ground
calibration/facilities, ground segment, data archive, software, in-orbit
performance, calibration, background model, observations, and some
preliminary results. Comment: 29 pages, 40 figures, 6 tables, to appear in
Sci. China-Phys. Mech. Astron. arXiv admin note: text overlap with
arXiv:1910.0443
Insight-HXMT observations of Swift J0243.6+6124 during its 2017-2018 outburst
The recently discovered neutron star transient Swift J0243.6+6124 has been
monitored by the Hard X-ray Modulation Telescope (Insight-HXMT).
Based on the obtained data, we investigate the broadband spectrum of the source
throughout the outburst. We estimate the broadband flux of the source and
search for a possible cyclotron line in the broadband spectrum. No evidence
of line-like features is, however, found up to . In the absence of
any cyclotron line in its energy spectrum, we estimate the magnetic field of
the source based on the observed spin evolution of the neutron star by
applying two accretion torque models. In both cases, we get consistent
results of , and a peak luminosity of , which makes the source the first
Galactic ultraluminous X-ray source hosting a neutron star. Comment: published
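For context, a standard torque-balance estimate of the kind alluded to above (a generic textbook form, not necessarily the specific torque models applied in the paper) relates the observed spin-up rate to the accretion rate and the magnetospheric radius:

```latex
% I: moment of inertia, \nu: spin frequency, \dot{M}: accretion rate,
% M: neutron star mass, r_m: magnetospheric radius, \mu: magnetic moment,
% \xi \sim 0.5 - 1: a model-dependent factor.
2\pi I \dot{\nu} \simeq \dot{M}\sqrt{G M r_m},
\qquad
r_m = \xi \left( \frac{\mu^4}{2 G M \dot{M}^2} \right)^{1/7}
```

Given a measured spin-up rate and an accretion rate inferred from the X-ray luminosity, these relations can be inverted for the magnetic moment, and hence the surface field strength.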
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning
We propose VRL3, a powerful data-driven framework with a simple design for
solving challenging visual deep reinforcement learning (DRL) tasks. We analyze
a number of major obstacles in taking a data-driven approach, and present a
suite of design principles, novel findings, and critical insights about
data-driven visual DRL. Our framework has three stages: in stage 1, we leverage
non-RL datasets (e.g. ImageNet) to learn task-agnostic visual representations;
in stage 2, we use offline RL data (e.g. a limited number of expert
demonstrations) to convert the task-agnostic representations into more powerful
task-specific representations; in stage 3, we fine-tune the agent with online
RL. On a set of challenging hand manipulation tasks with sparse reward and
realistic visual inputs, compared to the previous SOTA, VRL3 achieves an
average of 780% better sample efficiency. On the hardest task, VRL3 is
1220% more sample efficient (2440% when using a wider encoder) and solves the
task with only 10% of the computation. These significant results clearly
demonstrate the great potential of data-driven deep reinforcement learning.
Comment: 41 pages, under camera-ready revision, accepted to NeurIPS 202
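The three-stage pipeline described above can be sketched as a skeleton. The stage bodies here are placeholders that only record what each stage consumes; real training of an encoder, offline RL, and online RL would replace them (all names are hypothetical, not VRL3's actual code):

```python
def three_stage_training(pretrain_images, offline_transitions, online_steps):
    """Run the three stages in order on a shared agent state."""
    agent = {"encoder": None, "policy": None, "stages": []}

    # Stage 1: task-agnostic visual representation from non-RL data
    # (e.g. ImageNet-scale images).
    agent["encoder"] = ("pretrained", len(pretrain_images))
    agent["stages"].append("stage1")

    # Stage 2: offline RL on a small set of expert demonstrations turns
    # the generic representation into a task-specific one.
    agent["policy"] = ("offline_rl", len(offline_transitions))
    agent["stages"].append("stage2")

    # Stage 3: online RL fine-tunes the full agent in the environment.
    for _ in range(online_steps):
        pass  # env.step(...) and gradient updates would go here
    agent["stages"].append("stage3")
    return agent
```

The key design point the abstract emphasizes is that the stages are sequential and share one agent, so each stage starts from the previous stage's representation rather than from scratch.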